margin condition
sup
A.1 Notation In this appendix, we use the notation dฯt(,) to indicate the state-action visitation measure induced by the policy ฯ at time t. We overload the notation dฯt() to denote the state-visitation measure induced by the policy ฯ at time t. Likewise, the notations dDt (,) and dDt () indicate the empirical visitation measures in the dataset D. For a function g: X R, the norm kgk, supx X |g(x)|. Before discussing the proofs of the results, we also explain the instantiation of the function class in the tabular setting below. A.2 Imitation gap upper bound on empirical moment matching (Theorem 3.1) Below we restate Theorem 3.1 and provide a proof of this result. The key observation is that since the learner ฯMM best matches the empirical distribution in the dataset, which is in turn close to the population visitation measure induced by ฯE, we can expect the visitation measure induced by ฯE and ฯMM to be close. This in turns implies that both policies will collect a similar value under any reward function. Precisely characterizing the rates at which these distributions converge to one another results in the final bound. Consider the empirical moment matching learner ฯMM (eq. TV dฯt,dDt (20) where the equation follows by the variational definition of the total variation distance, and where dฯt is the state-action visitation measure induced by ฯE and dDt is the empirical state-action visitation measure in the dataset D. The imitation gap of this policy can be upper bounded by, J(ฯE) J(ฯMM) = EฯE "H This goes to show that in the tabular setting, MMis equivalent to finding the policy which best matches (in TV-distance) the empirical state-action distribution observed in the dataset.
Do More Predictions Improve Statistical Inference? Filtered Prediction-Powered Inference
Recent advances in artificial intelligence have enabled the generation of large-scale, low-cost predictions with increasingly high fidelity. As a result, the primary challenge in statistical inference has shifted from data scarcity to data reliability. Prediction-powered inference methods seek to exploit such predictions to improve efficiency when labeled data are limited. However, existing approaches implicitly adopt a use-all philosophy, under which incorporating more predictions is presumed to improve inference. When prediction quality is heterogeneous, this assumption can fail, and indiscriminate use of unlabeled data may dilute informative signals and degrade inferential accuracy. In this paper, we propose Filtered Prediction-Powered Inference (FPPI), a framework that selectively incorporates predictions by identifying a data-adaptive filtered region in which predictions are informative for inference. We show that this region can be consistently estimated under a margin condition, achieving fast rates of convergence. By restricting the prediction-powered correction to the estimated filtered region, FPPI adaptively mitigates the impact of biased or noisy predictions. We establish that FPPI attains strictly improved asymptotic efficiency compared with existing prediction-powered inference methods. Numerical studies and a real-data application to large language model evaluation demonstrate that FPPI substantially reduces reliance on expensive labels by selectively leveraging reliable predictions, yielding accurate inference even in the presence of heterogeneous prediction quality.
Don't Eliminate Cut: Exponential Separations in LLM-Based Theorem Proving
Sonoda, Sho, Akiyama, Shunta, Uezato, Yuya
We develop a theoretical analysis of LLM-guided formal theorem proving in interactive proof assistants (e.g., Lean) by modeling tactic proposal as a stochastic policy in a finite-horizon deterministic MDP. To capture modern representation learning, we treat the state and action spaces as general compact metric spaces and assume Lipschitz policies. To explain the gap between worst-case hardness and empirical success, we introduce problem distributions generated by a reference policy $q$, including a latent-variable model in which proofs exhibit reusable cut/lemma/sketch structure represented by a proof DAG. Under a top-$k$ search protocol and Tsybakov-type margin conditions, we derive lower bounds on finite-horizon success probability that decompose into search and learning terms, with learning controlled by sequential Rademacher/covering complexity. Our main separation result shows that when cut elimination expands a DAG of depth $D$ into a cut-free tree of size $ฮฉ(ฮ^D)$ while the cut-aware hierarchical process has size $O(ฮป^D)$ with $ฮป\llฮ$, a flat (cut-free) learner provably requires exponentially more data than a cut-aware hierarchical learner. This provides a principled justification for subgoal decomposition in recent agentic theorem provers.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper looks at differentially private algorithms for a generic maximization problem (private argmax might be a good name). Given a collection of K of items, and a data set D of n individuals, and a score function f that assigns each item i a data-based score f(i;D), the goal is to find an item i with approximately maximal score, while preserving differential privacy. This private argmax has proven to be a fundamental problem in the theory of private data analysis. It was first formulated by McSherry and Talwar (2007), who proposed the exponential mechanism to solve it.
Transfer Learning for Classification under Decision Rule Drift with Application to Optimal Individualized Treatment Rule Estimation
In this paper, we extend the transfer learning classification framework from regression function-based methods to decision rules. We propose a novel methodology for modeling posterior drift through Bayes decision rules. By exploiting the geometric transformation of the Bayes decision boundary, our method reformulates the problem as a low-dimensional empirical risk minimization problem. Under mild regularity conditions, we establish the consistency of our estimators and derive the risk bounds. Moreover, we illustrate the broad applicability of our method by adapting it to the estimation of optimal individualized treatment rules. Extensive simulation studies and analyses of real-world data further demonstrate both superior performance and robustness of our approach.